    Synchronization recovery and state model reduction for soft decoding of variable length codes

    Variable length codes exhibit de-synchronization problems when transmitted over noisy channels. Trellis decoding techniques based on Maximum A Posteriori (MAP) estimators are often used to minimize the error rate on the estimated sequence. If the number of symbols and/or bits transmitted is known by the decoder, termination constraints can be incorporated in the decoding process: all the paths in the trellis which do not lead to a valid sequence length are suppressed. This paper presents an analytic method to assess the expected error resilience of a variable length code (VLC) when trellis decoding with a sequence length constraint is used. The approach is based on the computation, for a given code, of the amount of information brought by the constraint. It is then shown that this quantity, as well as the probability that the VLC decoder does not re-synchronize in a strict sense, is not significantly altered by appropriate aggregation of trellis states. This proves that the performance obtained by running a length-constrained Viterbi decoder on aggregated state models approaches the one obtained with the bit/symbol trellis, at a significantly reduced complexity. It is finally shown that the complexity can be further decreased by projecting the state model onto two state models of reduced size.
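
    The kind of length-constrained trellis decoding described above can be pictured as a dynamic program over (bits consumed, symbols decoded) states that keeps only paths reaching exactly the known symbol count once all bits are consumed. The sketch below is a minimal illustration under simplified assumptions; the function name, the LLR-based bit costs and the toy codebook are illustrative, and the paper's actual contribution (state aggregation and its analysis) is not reproduced here.

    import math

    def viterbi_vlc_length_constrained(llr, codebook, num_symbols):
        """Length-constrained decoding of a variable length code on a
        bit/symbol trellis: only paths that decode exactly `num_symbols`
        symbols while consuming all received bits survive.

        llr         : per-bit log-likelihood ratios, log P(bit=0)/P(bit=1)
        codebook    : dict symbol -> codeword string, e.g. {'a': '0', 'b': '10', 'c': '11'}
        num_symbols : sequence length known at the decoder (termination constraint)
        """
        n_bits = len(llr)

        def bit_cost(i, b):
            # negative log-likelihood of bit b at position i (up to a constant)
            return math.log1p(math.exp(-llr[i])) if b == 0 else math.log1p(math.exp(llr[i]))

        # best[(i, k)] = (path cost, decoded symbols) after i bits and k symbols
        best = {(0, 0): (0.0, [])}
        for i in range(n_bits):
            for k in range(num_symbols):
                if (i, k) not in best:
                    continue
                cost, seq = best[(i, k)]
                for sym, word in codebook.items():
                    j = i + len(word)
                    if j > n_bits:
                        continue
                    c = cost + sum(bit_cost(i + t, int(word[t])) for t in range(len(word)))
                    if (j, k + 1) not in best or c < best[(j, k + 1)][0]:
                        best[(j, k + 1)] = (c, seq + [sym])
        # termination constraint: all bits consumed AND exactly num_symbols symbols decoded
        return best.get((n_bits, num_symbols), (None, None))[1]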

    Particular object retrieval with integral max-pooling of CNN activations

    Recently, image representations built upon Convolutional Neural Networks (CNNs) have been shown to provide effective descriptors for image search, outperforming pre-CNN features as short-vector representations. Yet such models are not compatible with geometry-aware re-ranking methods and are still outperformed, on some particular object retrieval benchmarks, by traditional image search systems relying on precise descriptor matching, geometric re-ranking, or query expansion. This work revisits both retrieval stages, namely initial search and re-ranking, by employing the same primitive information derived from the CNN. We build compact feature vectors that encode several image regions without the need to feed multiple inputs to the network. Furthermore, we extend integral images to handle max-pooling on convolutional layer activations, allowing us to efficiently localize matching objects. The resulting bounding box is finally used for image re-ranking. As a result, this paper significantly improves the existing CNN-based recognition pipeline: we report for the first time results competing with traditional methods on the challenging Oxford5k and Paris6k datasets.
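
    Pooling convolutional activations over many candidate boxes is expensive if done naively; one way to make it cheap, in the spirit of the integral-image extension mentioned above, is to replace the exact max by a generalized mean computed from a single cumulative-sum pass. The sketch below is an illustration only: it assumes non-negative activations, and the function name and exponent are arbitrary choices rather than the paper's implementation.

    import numpy as np

    def integral_generalized_max_pool(fmap, boxes, p=10.0):
        """Approximate per-channel max-pooling over many rectangular regions
        with one integral image of fmap**p; the p-norm (sum of x**p, then the
        1/p root) approaches the true max as p grows.

        fmap : (H, W, C) non-negative activations of a convolutional layer
        boxes: list of (y0, x0, y1, x1) with exclusive upper bounds
        """
        H, W, C = fmap.shape
        # integral image with a zero row/column so box sums need no edge cases
        ii = np.zeros((H + 1, W + 1, C), dtype=np.float64)
        ii[1:, 1:] = np.cumsum(np.cumsum(fmap.astype(np.float64) ** p, axis=0), axis=1)

        pooled = []
        for y0, x0, y1, x1 in boxes:
            s = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]   # per-channel sum of x**p
            pooled.append(np.maximum(s, 1e-12) ** (1.0 / p))        # generalized mean ~ max
        return np.stack(pooled)  # (len(boxes), C) region descriptors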

    Balancing clusters to reduce response time variability in large scale image search

    Many algorithms for approximate nearest neighbor search in high-dimensional spaces partition the data into clusters. At query time, in order to avoid exhaustive search, an index selects the few (or a single) clusters nearest to the query point. Clusters are often produced by the well-known k-means approach since it has several desirable properties. On the downside, it tends to produce clusters having quite different cardinalities. Imbalanced clusters negatively impact both the variance and the expectation of query response times. This paper proposes to modify k-means centroids to produce clusters with more comparable sizes without sacrificing the desirable properties. Experiments with a large scale collection of image descriptors show that our algorithm significantly reduces the variance of response times without seriously impacting the search quality.
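
    To illustrate the balance/quality trade-off at stake here, the sketch below shows one generic heuristic for evening out cluster cardinalities: a greedy assignment whose squared distance is penalized by the current cluster size. This is not the paper's algorithm (which modifies the k-means centroids themselves); it is a hypothetical baseline showing how response-time variability relates to cluster sizes.

    import numpy as np

    def size_penalized_assign(points, centroids, penalty=0.0):
        """Assign each point to a centroid while discouraging over-full clusters.

        penalty=0 recovers plain nearest-centroid assignment; a positive penalty
        adds penalty * current_cluster_size to the squared distance, so boundary
        points drift toward emptier clusters and the inverted-list lengths probed
        at query time become more comparable.
        """
        k = len(centroids)
        sizes = np.zeros(k)
        labels = np.empty(len(points), dtype=int)
        # random processing order so early points do not bias the running sizes
        for i in np.random.permutation(len(points)):
            d2 = np.sum((centroids - points[i]) ** 2, axis=1) + penalty * sizes
            labels[i] = int(np.argmin(d2))
            sizes[labels[i]] += 1
        return labels, sizes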

    Orientation covariant aggregation of local descriptors with embeddings

    Image search systems based on local descriptors typically achieve orientation invariance by aligning the patches on their dominant orientations. Albeit successful, this choice introduces too much invariance because it does not guarantee that the patches are rotated consistently. This paper introduces an aggregation strategy for local descriptors that achieves orientation covariance by jointly encoding the angle in the aggregation stage in a continuous manner. It is combined with an efficient monomial embedding to provide a codebook-free method to aggregate local descriptors into a single vector representation. Our strategy is also compatible with, and employed in combination with, several popular encoding methods, in particular bag-of-words, VLAD and the Fisher vector. Our geometry-aware aggregation strategy is effective for image search, as shown by experiments performed on standard benchmarks for image and particular object retrieval, namely Holidays and Oxford Buildings. (European Conference on Computer Vision, 2014.)
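
    A rough, simplified sketch of angle-covariant aggregation: each local descriptor is modulated by a small Fourier embedding of its dominant orientation before summation, so a global rotation of the image transforms the aggregated vector predictably instead of being discarded. The frequencies, function name and plain Kronecker-product modulation below are illustrative assumptions; the paper combines the angle encoding with a monomial embedding of the descriptors.

    import numpy as np

    def angle_modulated_aggregate(descs, angles, freqs=(1, 2, 3)):
        """Aggregate local descriptors into one vector while keeping their
        dominant orientations encoded.

        descs : (n, d) L2-normalised local descriptors (e.g. RootSIFT)
        angles: (n,) dominant orientations in radians
        """
        parts = []
        for x, a in zip(descs, angles):
            # Fourier embedding of the angle: cosines and sines of a few frequencies
            emb = np.concatenate([np.cos(np.outer(freqs, a)).ravel(),
                                  np.sin(np.outer(freqs, a)).ravel()])
            parts.append(np.kron(emb, x))            # angle embedding (x) descriptor
        agg = np.sum(parts, axis=0)
        return agg / (np.linalg.norm(agg) + 1e-12)   # single global representation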

    Low-shot learning with large-scale diffusion

    This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is made possible by leveraging recent advances in large-scale similarity graph construction. We show that, despite its conceptual simplicity, scaling label propagation up to hundreds of millions of images leads to state-of-the-art accuracy in the low-shot learning regime.
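
    The label-propagation step can be pictured with the classic diffusion iteration below, written here in dense form as a toy sketch; the contribution of the paper is making this idea work on a similarity graph of hundreds of millions of images, which requires sparse k-NN graphs and efficient solvers that this illustration omits.

    import numpy as np

    def propagate_labels(W, labels, num_classes, alpha=0.9, iters=20):
        """Semi-supervised label propagation (diffusion) on a similarity graph.

        W          : (n, n) symmetric affinity matrix (dense here for simplicity,
                     e.g. built from a k-NN graph over image descriptors), zero diagonal
        labels     : length-n array with a class id for the few annotated images
                     and -1 for unlabelled images
        Returns per-image class scores; argmax over columns gives the prediction.
        """
        n = len(labels)
        # symmetrically normalised adjacency: D^{-1/2} W D^{-1/2}
        d = np.maximum(W.sum(axis=1), 1e-12)
        S = W / np.sqrt(np.outer(d, d))

        # one-hot seed matrix for the labelled examples
        Y = np.zeros((n, num_classes))
        for i, y in enumerate(labels):
            if y >= 0:
                Y[i, y] = 1.0

        F = Y.copy()
        for _ in range(iters):
            F = alpha * S.dot(F) + (1 - alpha) * Y   # diffusion with a restart term
        return F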

    Circulant temporal encoding for video retrieval and temporal alignment

    We address the problem of specific video event retrieval. Given a query video of a specific event, e.g., a concert of Madonna, the goal is to retrieve other videos of the same event that temporally overlap with the query. Our approach encodes the frame descriptors of a video to jointly represent their appearance and temporal order. It exploits the properties of circulant matrices to efficiently compare the videos in the frequency domain. This offers a significant gain in complexity and accurately localizes the matching parts of videos. The descriptors can be compressed in the frequency domain with a product quantizer adapted to complex numbers. In this case, video retrieval is performed without decompressing the descriptors. We also consider the temporal alignment of a set of videos. We exploit the matching confidence and an estimate of the temporal offset computed for all pairs of videos by our retrieval approach. Our robust algorithm aligns the videos on a global timeline by maximizing the set of temporally consistent matches. The global temporal alignment enables synchronous playback of the videos of a given scene.
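
    The core frequency-domain comparison can be sketched as an FFT-based circular cross-correlation of the two frame-descriptor sequences, as below. This toy version is only an illustration of why circulant structure helps: the actual method additionally regularizes the comparison and operates on compressed (product-quantized) Fourier coefficients.

    import numpy as np

    def circular_match_score(q, v):
        """Score the temporal match between two videos via circular
        cross-correlation of their frame descriptors, computed with FFTs.

        q, v : (T, d) per-frame descriptors, zero-padded to the same length T;
               each descriptor dimension is treated as a 1-D temporal signal.
        Returns the best score and the frame offset at which it is reached.
        """
        Q = np.fft.rfft(q, axis=0)
        V = np.fft.rfft(v, axis=0)
        # per-dimension circular correlations, summed over descriptor dimensions
        corr = np.fft.irfft(np.conj(Q) * V, n=q.shape[0], axis=0).sum(axis=1)
        shift = int(np.argmax(corr))
        return float(corr[shift]), shift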

    Memory vectors for similarity search in high-dimensional spaces

    We study an indexing architecture to store and search in a database of high-dimensional vectors from the perspective of statistical signal processing and decision theory. This architecture is composed of several memory units, each of which summarizes a fraction of the database by a single representative vector. The potential similarity of the query to one of the vectors stored in the memory unit is gauged by a simple correlation with the memory unit's representative vector. This representative optimizes the test of the following hypothesis: the query is independent of any vector in the memory unit vs. the query is a simple perturbation of one of the stored vectors. Compared to exhaustive search, our approach finds the most similar database vectors significantly faster without a noticeable reduction in search quality. Interestingly, the reduction of complexity is provably better in high-dimensional spaces. We empirically demonstrate its practical interest in a large-scale image search scenario with off-the-shelf state-of-the-art descriptors. (Accepted to IEEE Transactions on Big Data.)
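
    A minimal sketch of what a memory-unit representative and its correlation test can look like under simplified assumptions: a plain sum of the stored vectors, or a least-squares (pseudo-inverse) construction that gives every stored vector unit correlation with the representative. Function names are illustrative, and the scoring shown here is only the first filtering stage before the selected units are searched exhaustively.

    import numpy as np

    def memory_vector(X, method="pinv"):
        """Build one representative vector for a memory unit.

        X : (n, d) matrix of L2-normalised database vectors stored in the unit.
        "sum"  : representative is the sum of the stored vectors.
        "pinv" : solve X m = 1 in the least-squares sense, so each stored vector
                 has unit correlation with m, sharpening the hypothesis test.
        """
        if method == "sum":
            return X.sum(axis=0)
        return np.linalg.pinv(X).dot(np.ones(X.shape[0]))  # m = X^+ 1

    def score_units(query, memories):
        """Correlate the query with each unit's representative; only the
        highest-scoring units are then searched vector by vector."""
        return np.array([m.dot(query) for m in memories])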

    Codes robustes et codes joints source-canal pour transmission multimédia sur canaux mobiles (Robust codes and joint source-channel codes for multimedia transmission over mobile channels)

    Some new error-resilient source coding and joint source/channel coding techniques are proposed for the transmission of multimedia sources over error-prone channels. First, we introduce a class of entropy codes providing unequal error resilience, i.e. providing some protection to the most sensitive information. These codes are then extended to exploit temporal dependencies. A new state model based on the aggregation of some states of the trellis is then proposed and analyzed for soft source decoding of variable length codes with a length constraint. It allows weighting the compromise between estimation accuracy and decoding complexity. Next, some packetization methods are proposed to reduce the error propagation phenomenon of variable length codes; the production of codewords and the packetization step are separated at the entropy coding stage, and several bitstream construction strategies are proposed. Finally, some re-writing rules are proposed to extend the binary codetree representation of entropy codes. The proposed representation allows in particular the design of codes with improved soft decoding performance.